MULTIVARIATE DATA ANALYSIS METHOD AND USES THEREOF

RELATED APPLICATIONS 
This application is a continuation-in-part of U.S. Patent Application
Serial No. 10/293,092 filed November 13, 2002, which claims priority of U.S.
Provisional Patent Application Serial No. 60/338,574 filed November 13,
2001. These applications are incorporated herein by reference.

BACKGROUND OF THE INVENTION 
Designing a good information system based on several characteristics is
an important requirement for successfully carrying out any decision-making
activity. In many cases, although a significant amount of information is
available, we fail to use it in a meaningful way. Just as we require high
quality products in day-to-day life, we also require high quality
information systems to make robust decisions or predictions. To produce high
quality products, it is well established that the variability in the processes
must be reduced first. Variability can be accurately measured and reduced only
if we have a suitable measurement system with appropriate measures. Similarly,
in the design of information systems, it is essential to develop a measurement
scale and use appropriate measures to make accurate predictions or decisions.

Usually, information systems deal with multidimensional characteristics.
A multidimensional system could be an inspection system, a medical diagnosis
system, a sensor system, a face/voice recognition system (any pattern
recognition system), a credit card/loan approval system, a weather forecasting
system or a university admission system. As we encounter these
multidimensional systems in day-to-day life, it is important to have a
measurement scale by which the degree of abnormality (severity) can be measured
in order to make appropriate decisions. In the case of medical diagnosis, the
degree of abnormality refers to the severity of diseases, and in the case of a
credit card/loan approval system it refers to the ability to pay back the
balance/loan. A measurement scale based on the characteristics of a
multidimensional system greatly enhances the decision maker's ability to make
judicious decisions. While developing a multidimensional measurement scale, it
is essential to keep in mind the following criteria: 1) having a base or
reference point for the scale, 2) validation of the scale, and 3) selection of
a useful subset of variables with suitable measures for future use.

There are several multivariate methods. These methods are being used
in multidimensional applications, but there are still incidences of false
alarms in applications like weather forecasting, airbag sensor operation, and
medical diagnosis. These problems may arise from the lack of an adequate
measurement system with suitable measures to determine or predict the degree
of severity accurately.

SUMMARY OF THE INVENTION 
A process for multivariate data analysis includes the steps of using an
adjoint matrix to compute a new distance for a data set in a Mahalanobis space.
The relation of a datum relative to the Mahalanobis space is then determined.

A medical diagnosis process includes defining a set of variables relating
to a patient condition and collecting a data set of the set of variables for a
normal group. Standardized values of the set of variables of the normal group
are then computed and used to construct a Mahalanobis space. A distance for
an abnormal value outside the Mahalanobis space is then computed. Important
variables from the set of variables are identified based on orthogonal arrays
and signal to noise ratios. Subsequent monitoring of conditions occurs based
upon the important variables.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 is a schematic illustrating a multi-dimensional diagnosis
system of the present invention;

Figure 2 is a graphical representation of a voice recognition pattern
according to the present invention parsed into k subsets that correspond to
k patterns numbered 1, 2, ..., k, where each pattern starts at a low value,
reaches a maximum and then again returns to the low value;

Figure 3 is a graphical representation of MDA values for normal and
abnormal values for nine separate data points;

Figure 4 is a graphical representation of MDA values for normal versus
abnormal values with important variable usage, for the data of Figure 3;

Figure 5 is a graphical representation of Gram-Schmidt predicted values
as a function of variable number compared with assigned values for a
seventeen variable test set; and

Figure 6 is a graphical representation of Gram-Schmidt predicted values
as a function of variable number compared with assigned values for a nineteen
variable test set including two variables with zero standard deviation.

DETAILED DESCRIPTION OF THE INVENTION 
The inventive method helps develop a multidimensional measurement
scale by integrating mathematical and statistical concepts, such as the
Mahalanobis distance and the Gram-Schmidt orthogonalization method, with the
principles of quality engineering, or Taguchi Methods.

The selection of the unit group (Mahalanobis group) is the most important
aspect of the Mahalanobis-Taguchi System (MTS) and its related methods. Every
individual observation in this group has a unique pattern. Since the conditions
of the observations are measured from this group, it is desirable that
observations within this group be as uniform as possible. From this group, the
distances of observations outside of the group are measured to perform the
diagnosis. These distances, which are similar to the Mahalanobis distance,
indicate the degree of severity of individual observations. A group of
observations is needed (as in the case of the reference group) to measure
distances because a correlation structure cannot be obtained from a single
observation. It should be noted that the correlation matrix corresponding to
this reference group is also used to measure distances outside of this group.
In MTS, S/N ratios are calculated based on the observations that are outside
of the unit space.

In MTS and its related methods, the diagnosis is performed after
validating the scale with variables defining the multidimensional system. The
validation is done with observations outside of the unit group by computing
S/N ratios. The S/N ratio is the measure of correlation between the "input
signal" and the "output" of the system. If there is a good correlation (higher
S/N ratio), then the scale is useful for diagnosis.

One of the main objectives of the present invention is to introduce a
scale based on all input characteristics to measure the degree of abnormality.
In the case of medical diagnosis, for example, the aim is to measure the degree
of severity of each disease based on this scale. To construct such a scale,
Mahalanobis distance (MD) is used. MD is a squared distance (also denoted as
$D^2$) and is calculated for the j-th observation, in a sample of size n with
k variables, by using the following formula:

$$MD_j = D_j^2 = \frac{1}{k}\, Z_{ij}\, C^{-1}\, Z_{ij}' \tag{1}$$

Where, j = 1 to n
$Z_{ij} = (z_{1j}, z_{2j}, \ldots, z_{kj})$
       = standardized vector obtained from the standardized values of $X_{ij}$
         (i = 1..k)
$z_{ij} = (X_{ij} - m_i)/s_i$
$X_{ij}$ = value of the i-th characteristic in the j-th observation
$m_i$ = mean of the i-th characteristic
$s_i$ = s.d. of the i-th characteristic
k = number of characteristics/variables
' = transpose of the vector
$C^{-1}$ = inverse of the correlation matrix

There is also an alternate way to compute MD values using the
Gram-Schmidt orthogonalization process. It can be seen that MD in Equation (1)
is obtained by scaling, that is by dividing by k, the original Mahalanobis
distance. MD can be considered as the mean square deviation (MSD) in
multidimensional spaces. The present invention focuses on constructing a
normal group, or in the application of medical diagnosis a healthy group, from
a data population, called the Mahalanobis Space (MS). Defining the normal group
or MS is the choice of a specialist conducting the data analysis. In the case
of medical diagnosis, the MS is constructed only for the people who are
healthy, and in the case of a manufacturing inspection system, the MS is
constructed for high quality products. Thus, MS is a database for the normal
group consisting of the following quantities:

$m_i$ = mean vector
$s_i$ = standard deviation vector
C = correlation matrix.

Since MD values are used to define the normal group, this group is
designated as the Mahalanobis Space. It can be easily shown, with
standardized values, that MS has the zero point as the mean vector and an
average MD of unity. Because the average MD of MS is unity, MS is also called
the unit space. The zero point and the unit distance are used as the reference
point for the scale of normalcy relating to inclusion of a subject within MS.
This scale is often operative in identifying the conditions outside the
Mahalanobis Space. In order to validate the accuracy of the scale, different
kinds of known conditions outside MS are used. If the scale is good, these
conditions should have MDs that match the decision maker's judgment. In this
application, the conditions outside MS are not considered as a separate group
(population) because the occurrence of each of these conditions is unique; for
example, a patient may be abnormal because of high blood pressure or because
of high sugar content. For this reason, the same correlation matrix of the MS
is used to compute the MD value of each abnormal. The MD of an abnormal point
is the distance of that point from the center point of MS.
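
The construction of the Mahalanobis Space and the scaled distance of
Equation (1) can be illustrated with a minimal Python sketch. The data are
synthetic, the function names build_mahalanobis_space and scaled_md are
hypothetical, and the population standard deviation (divisor n) is an
assumption of the sketch; with that choice the average MD of the normal group
comes out as unity up to rounding.

```python
import numpy as np

def build_mahalanobis_space(normal):
    """Return (mean, sd, corr) for a normal group of shape (n, k)."""
    mean = normal.mean(axis=0)
    sd = normal.std(axis=0)  # population s.d. (ddof=0): an assumption of this sketch
    corr = np.corrcoef(normal, rowvar=False)
    return mean, sd, corr

def scaled_md(x, mean, sd, corr):
    """Scaled MD of Equation (1) for each row of x, measured from the given space."""
    z = (np.atleast_2d(x) - mean) / sd          # standardize with the normal group's statistics
    k = z.shape[1]
    c_inv = np.linalg.inv(corr)
    return np.einsum("ij,jl,il->i", z, c_inv, z) / k

# Synthetic "healthy" group: 200 observations of 4 correlated variables (made-up data).
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.0, 0.0],
                [0.0, 1.0, 0.3, 0.0],
                [0.0, 0.0, 1.0, 0.2],
                [0.0, 0.0, 0.0, 1.0]])
normal = rng.normal(size=(200, 4)) @ mix + np.array([120.0, 80.0, 5.0, 37.0])

mean, sd, corr = build_mahalanobis_space(normal)
md_normal = scaled_md(normal, mean, sd, corr)

# A hypothetical abnormal: every variable 4 s.d. above the healthy mean.
abnormal = mean + 4.0 * sd
md_abnormal = scaled_md(abnormal, mean, sd, corr)

print("average MD of the normal group:", md_normal.mean())
print("MD of the abnormal observation:", md_abnormal[0])
```

The abnormal observation is standardized with the mean, standard deviation
and correlation matrix of the normal group, so no separate statistics are
estimated for the abnormal conditions.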

In the next phase of the invention, orthogonal arrays (OAs) and
signal-to-noise (S/N) ratios are used to choose the relevant variables. There
are different kinds of S/N ratios depending on the prior knowledge about the
severity of the abnormals.

A typical multidimensional system used in the present invention is
shown in Figure 1, where $X_1, X_2, \ldots, X_n$ correspond to the variables
that provide a set of information to make a decision. Using these variables,
MS is constructed for the healthy or normal group, which becomes the reference
point for the measurement scale. After constructing the MS, the measurement
scale is validated by considering the conditions outside MS. These outside
conditions are typically checked with the given input signals and in the
presence of noise factors (if any). If noise factors are present, a correct
decision still has to be made about the state of the system. In the context of
a multivariate diagnosis system, it is appropriate to consider two types of
noise conditions: 1) active noise and 2) criminal noise. An example of an
active noise condition is a change in usage environment, such as conditions in
different manufacturing environments or different hospitals; an example of a
criminal noise condition is an unexpected condition under which the system is
operating, such as the terrorist attacks of 11 September 2001. It is important
to design multivariate information systems considering these two types of
noise conditions. In Figure 1, the input signal is the true value of the state
of the system, if known. The output (MD) should have a good correlation with
the true state of the system (input signal). In most applications, it is not
easy to obtain the true states of the system. In such cases, the working
averages of the different classes, where the classes correspond to the
different degrees of severity, can be considered as the input signals.

After validating the measurement scale, OAs and S/N ratios are used to
identify the variables of importance. OAs are used to minimize the number of
variable combinations to be tested. The variables are allocated to the columns
of the array. In MTS analysis only two-level OAs are used, as there are only
two levels for the variables: presence and absence. To identify the variables
of importance, S/N ratios are used.

The inventive process can illustratively be applied to a 
multidimensional system in four stages. The steps in each exemplary stage are 
listed below: 

Stage I: Construction of a Measurement Scale with Mahalanobis Space
(Unit Space) as the Reference

• Define the variables that determine the healthiness of a condition. For
example, in a medical diagnosis application, the doctor has to consider the
variables of all diseases to define a healthy group. In general, for pattern
recognition applications, the term "healthiness" must be defined with
respect to a "reference pattern".

• Collect the data on all the variables from the healthy group.

• Compute the standardized values of the variables of the healthy group.

• Compute the MDs of all observations. With these MDs, the zero point and the
unit distance are defined.

• Use the zero point and the unit distance as the reference point or base for
the measurement scale.

Stage II: Validation of the Measurement Scale 

• Identify the abnormal conditions. In medical diagnosis applications, the 
abnormal conditions refer to the patients having different kinds of diseases. 
In fact, to validate the scale, any condition outside MS is chosen. 

• Compute the MDs corresponding to these abnormal conditions to validate 
the scale. The variables in the abnormal conditions are normalized by 
using the mean and s.d.s of the corresponding variables in the healthy 
group. The correlation matrix or set of Gram-Schmidt's coefficients, if 
Gram-Schmidt's method is used, corresponding to the healthy group is 
used for finding the MDs of abnormal conditions. 

• If the scale is good, the MDs corresponding to the abnormal conditions 
should have higher values. In this way the scale is validated. In other 
words, the MDs of conditions outside MS must match with judgment. 

Stage III: Identify the Useful Variables (Developing Stage) 

• Find out the useful set of variables using orthogonal arrays (OAs) and S/N 
ratios. S/N ratio, obtained from the abnormal MDs, is used as the response 
for each combination of OA. The useful set of variables is obtained by 
evaluating the "gain" in S/N ratio. 






ASI-10003/03 
31209gs 

Stage IV: Future Diagnosis with Useful Variables 

Monitor the conditions using the scale, which is developed with the 
help of the useful set of variables. Based on the values of MDs, appropriate 
corrective actions can be taken. The decision to take the necessary actions 
depends on the value of the threshold. 

In the case of a medical diagnosis application, the above steps have to be
performed for each kind of disease in the subsequent phases of diagnosis. It is
appreciated that many additional applications for the present invention exist,
as illustratively recited in "The Mahalanobis Taguchi Strategy - A Pattern
Technology System" by G. Taguchi and R. Jugulum, John Wiley, 2002 and in
"The Mahalanobis Taguchi System" by G. Taguchi et al., McGraw-Hill, 2001.

According to the present invention, an adjoint matrix method is used to 
calculate MD values. 

If A is a square matrix (the inverse can be computed for square matrices
only), then its inverse $A^{-1}$ is given as:

$$A^{-1} = \frac{1}{\det A}\, A_{adj} \tag{2}$$

Where,

$A_{adj}$ is called the adjoint matrix of A. The adjoint matrix is the
transpose of the cofactor matrix, which is formed from the cofactors of all the
elements of matrix A.
det A is called the determinant of the matrix A. The determinant is a
characteristic number (scalar) associated with a square matrix. A matrix is
said to be singular if its determinant is zero.

The importance of the determinant can be realized when solving a system of
linear equations using matrix algebra. The solution to the system of equations
contains the inverse matrix term, which is obtained by dividing the adjoint
matrix by the determinant. If the determinant is zero, then a unique solution
does not exist.

Considering a 2 x 2 matrix as shown below:

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

The determinant of this matrix is $a_{11}a_{22} - a_{12}a_{21}$.

Considering a 3 x 3 matrix as shown below:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

The determinant of A can be calculated as:

$$\det A = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13}$$

Where,

$A_{11} = (a_{22}a_{33} - a_{23}a_{32})$; $A_{12} = -(a_{21}a_{33} - a_{23}a_{31})$;
$A_{13} = (a_{21}a_{32} - a_{22}a_{31})$ are called the cofactors of the
elements $a_{11}$, $a_{12}$, and $a_{13}$ of matrix A respectively. Along a row
or a column, the cofactors have alternating plus and minus signs: the cofactor
of $a_{ij}$ carries the sign $(-1)^{i+j}$, so the cofactor of $a_{11}$ is
positive.

The above equation is obtained by using the elements of the first row
and the sub matrices obtained by deleting the rows and columns passing
through these elements. The same value of determinant can be obtained by
using other rows or any column of the matrix. In general, the determinant of an
n x n square matrix can be written as:

$$\det A = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}$$
along any row index i, where i = 1, 2, ..., n
or
$$\det A = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj}$$
along any column index j, where j = 1, 2, ..., n

Cofactor

From the above discussion, it is clear that the cofactor $A_{ij}$ of an
element $a_{ij}$ is the factor remaining after the element $a_{ij}$ is factored
out. The method of computing the cofactors is explained above for a 3 x 3
matrix. Along a row or a column, the cofactors have alternating positive and
negative signs, the cofactor of $a_{ij}$ carrying the sign $(-1)^{i+j}$.

Adjoint matrix of a square matrix

The adjoint of a square matrix A is obtained by replacing each element
of A with its own cofactor and transposing the result.

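Considering a 3 x 3 matrix as shown below:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

The cofactor matrix containing the cofactors ($A_{ij}$'s) of the elements of
the above matrix can be written as:

$$\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix}$$

The adjoint of the matrix A, which is obtained by transposing the cofactor
matrix, can be written as:

$$\mathrm{Adj.}\,A = \begin{bmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{bmatrix}$$
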
Inverse Matrix

The inverse of matrix A (denoted as $A^{-1}$) can be obtained by dividing
the elements of its adjoint by the determinant.

Singular and Non-Singular Matrices

If the determinant of a square matrix is zero, then it is called a singular
matrix. Otherwise, the matrix is known as non-singular.
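
The cofactor, adjoint and inverse definitions above can be checked
numerically. The following Python sketch is illustrative only: the function
names cofactor_matrix and adjoint and the example matrices are hypothetical,
and the cofactors are computed from minors exactly as described in the text
rather than by any optimized routine.

```python
import numpy as np

def cofactor_matrix(a):
    """Cofactor matrix: entry (i, j) is (-1)**(i+j) times the determinant of the minor."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Minor: delete row i and column j.
            minor = np.delete(np.delete(a, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof

def adjoint(a):
    """Adjoint of a square matrix: transpose of its cofactor matrix."""
    return cofactor_matrix(a).T

# Non-singular example: the inverse follows from Equation (2).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
det_A = np.linalg.det(A)
A_inv = adjoint(A) / det_A

# Singular example: the determinant is zero, yet the adjoint is still defined.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
det_S = np.linalg.det(S)
adj_S = adjoint(S)

print("det A =", det_A)
print("det S =", det_S)
print("adjoint of S:\n", adj_S)
```

The singular example shows the property relied upon below: the adjoint can be
formed even when division by the determinant is not possible.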
The present invention is applied to solve a number of longstanding data
analysis problems. These are exemplified as follows.

Multi-collinearity problems

Multi-collinearity problems arise out of strong correlations. When
there are strong correlations, the determinant of the correlation matrix tends
to zero, making the matrix nearly singular. In such cases, the inverse
matrix will be inaccurate or cannot be computed (because the determinant term
is in the denominator of Equation (2)). As a result, scaled MDs will also be
inaccurate or cannot be computed. Such problems can be avoided by using a
matrix form that is not affected by the determinant term. From Equation (2), it
is clear that the adjoint matrix satisfies this requirement.

MD values in the MTS method are computed by using the inverse of the
correlation matrix ($C^{-1}$, where C is the correlation matrix). In the
present invention, the adjoint matrix is used to calculate the distances. If
MDA denotes the distances obtained from the adjoint matrix method, then the
equation for MDA can be written as:

$$MDA_j = \frac{1}{k}\, Z_{ij}\, C_{adj}\, Z_{ij}' \tag{3}$$

Where, j = 1 to n
$Z_{ij} = (z_{1j}, z_{2j}, \ldots, z_{kj})$
       = standardized vector obtained from the standardized values of $X_{ij}$
         (i = 1..k)
$z_{ij} = (X_{ij} - m_i)/s_i$
$X_{ij}$ = value of the i-th characteristic in the j-th observation
$m_i$ = mean of the i-th characteristic
$s_i$ = s.d. of the i-th characteristic
k = number of characteristics/variables
' = transpose of the vector
$C_{adj}$ = adjoint of the correlation matrix.

The relationship between the conventional MD and the MDA in
Equation (3) can be written as:

$$MD_j = \frac{1}{\det C}\, MDA_j \tag{4}$$

Thus, an MDA value is similar to an MD value but has different properties;
in particular, the average MDA is not unity. As in the case of MD values, MDA
values represent the distances from the normal group and can be used to
measure the degree of abnormality. In the adjoint matrix method also, the
Mahalanobis space contains the means, standard deviations and correlation
structure of the normal or healthy group. Here, the Mahalanobis space cannot
be called the unit space since the average of the MDAs is not unity.
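
Equations (3) and (4) can be illustrated on a deliberately near-collinear
normal group. The Python sketch below uses synthetic data and a hypothetical
helper named adjoint; the population standard deviation (divisor n) is an
assumption. Under that assumption the average MD is unity, so by Equation (4)
the average MDA equals the determinant of the correlation matrix.

```python
import numpy as np

def adjoint(c):
    """Adjoint (transposed cofactor matrix) built from minors, without using the inverse."""
    n = c.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(c, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

# Synthetic normal group in which the second variable nearly duplicates the first.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)
x3 = rng.normal(size=100)
normal = np.column_stack([x1, x2, x3])

z = (normal - normal.mean(axis=0)) / normal.std(axis=0)   # ddof=0 is an assumption
corr = np.corrcoef(normal, rowvar=False)
k = z.shape[1]
det_c = np.linalg.det(corr)

mda = np.einsum("ij,jl,il->i", z, adjoint(corr), z) / k           # Equation (3)
md = np.einsum("ij,jl,il->i", z, np.linalg.inv(corr), z) / k      # Equation (1)

print("det C       =", det_c)
print("average MDA =", mda.mean())
print("average MD  =", md.mean())
```

The MDA values are computed without ever dividing by the small determinant;
the conventional MD is shown only to confirm the relationship of Equation (4).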
β-adjustment method

The present invention has applications in multivariate analysis in the
presence of small correlation coefficients in the correlation matrix. When
there are small correlation coefficients, the adjustment factor β is
calculated as follows.

β = 0                                        if r ≤ 1/√n
β = [expression not legible in the source]   if r > 1/√n        (5)

where r is the correlation coefficient and n is the sample size.

After computing β, the elements of the correlation matrix are adjusted
by multiplying them by β. This adjusted matrix is used to carry out MTS
analysis or analysis with the adjoint matrix.

To explain the applicability of the β-adjustment method, Dr. Kanetaka's
data on liver disease testing is used. The data contains observations of the
healthy group as well as of the conditions outside the Mahalanobis space (MS).
The healthy group (MS) is constructed based on observations on 200 people who
do not have any health problems. There are 17 abnormal conditions. This
example is chosen since the correlation matrix in this case contains a few
small correlation coefficients. The corresponding β-adjusted correlation
matrix (using Equation (5)) is as shown in Table 1.

With this matrix, MTS analysis is carried out with dynamic S/N ratio
analysis, and as a result the following useful variable combination was
obtained: X4-X5-X7-X10-X12-X13-X14-X15-X16-X17. This combination is very
similar to the useful variable set obtained without β-adjustment; the only
difference is the presence of variables X7 and X16.

With this useful variable set, S/N ratio analysis is carried out to
measure the improvement in overall system performance. From Table 2, which
shows system performance in the form of S/N ratios, it is clear that there is a
gain of 0.91 dB if the useful variables are used instead of the entire set of
variables.

Table 2: S/N Ratio Analysis (β-adjustment method)

S/N ratio-optimal system      43.81 dB
S/N ratio-original system     42.90 dB
Gain                           0.91 dB


In an alternate embodiment of the present invention, a Mahalanobis
distance is computed using a Gram-Schmidt orthogonalization process (GSP).
GSP is often a more robust and sample size insensitive orthogonalization
process. As in MTS, in the inventive Mahalanobis-Taguchi Gram-Schmidt (MTGS)
method the coefficients of the orthogonal expansion of the unit group are also
used to predict the conditions outside this group. The usefulness of this space
is tested with signal-to-noise ratios, just as control factors are tested in
hardware design. According to the Gram-Schmidt process, original variables are
converted to orthogonal and independent variables. The Gram-Schmidt
orthogonalization process is particularly well suited to identify the
direction of abnormals. While measuring the degree of abnormality of a given
value, a longer distance corresponds to a higher degree of severity. In some
instances, such as stock performance or financial market predictions, a longer
distance can represent a favorable situation if the normal space is constructed
based on companies with average performance. In such an instance, both
underperforming and outperforming companies will have longer distances.
Distinguishing these diametrically opposed abnormal situations is preferably
performed with the Gram-Schmidt orthogonalization process (GSP).

The GSP operates on a set of given linearly independent vectors
$Z_1, Z_2, \ldots, Z_k$ to determine a corresponding set of mutually
perpendicular vectors $U_1, U_2, \ldots, U_k$ with the same linear span, as
shown in Equation (6).

$$Z_1, Z_2, \ldots, Z_k \;(\text{original vectors}) \;\longrightarrow\; \text{GSP} \;\longrightarrow\; U_1, U_2, \ldots, U_k \;(\text{orthogonal vectors}) \tag{6}$$




The Gram-Schmidt vectors are constructed sequentially by setting up
Equations (7).

$$\begin{aligned}
U_1 &= Z_1 \\
U_2 &= Z_2 - \frac{Z_2' U_1}{U_1' U_1}\, U_1 \\
&\;\;\vdots \\
U_k &= Z_k - \frac{Z_k' U_1}{U_1' U_1}\, U_1 - \cdots - \frac{Z_k' U_{k-1}}{U_{k-1}' U_{k-1}}\, U_{k-1}
\end{aligned} \tag{7}$$

Where ' denotes a vector transpose. While calculating MD using GSP,
standardized values of the variables are used. Therefore, in the above set of
Equations (7), $Z_1, Z_2, \ldots, Z_k$ correspond to standardized values.
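
The sequential construction of Equations (7) can be written as a short
routine. The Python sketch below is one possible rendering, with a
hypothetical function name gram_schmidt and made-up input vectors; skipping
projections onto zero vectors is an added safeguard of the sketch, not a step
recited above.

```python
import numpy as np

def gram_schmidt(z):
    """Apply Equations (7): column i of z is Z_i, column i of the result is U_i."""
    z = np.asarray(z, dtype=float)
    u = np.zeros_like(z)
    for i in range(z.shape[1]):
        u[:, i] = z[:, i]
        for m in range(i):
            denom = u[:, m] @ u[:, m]          # U_m' U_m
            if denom > 0.0:                    # assumption: ignore zero vectors
                u[:, i] -= (z[:, i] @ u[:, m]) / denom * u[:, m]
    return u

# Three made-up, linearly independent vectors of length 6, stored as columns.
rng = np.random.default_rng(4)
z = rng.normal(size=(6, 3))
u = gram_schmidt(z)

# Off-diagonal entries of U'U vanish when the vectors are mutually perpendicular.
gram = u.T @ u
off_diagonal = gram - np.diag(np.diag(gram))
print(np.round(off_diagonal, 12))
```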

Calculation of MD Using Gram-Schmidt Process (GSP)

Consider a sample of size n, where each observation contains values of
k variables. After standardizing the variables, a set of standardized vectors
is obtained. Let these vectors be:

$$\begin{aligned}
Z_1 &= (z_{11}, z_{12}, \ldots, z_{1n}) \\
Z_2 &= (z_{21}, z_{22}, \ldots, z_{2n}) \\
&\;\;\vdots \\
Z_k &= (z_{k1}, z_{k2}, \ldots, z_{kn})
\end{aligned} \tag{8}$$

After performing GSP, the orthogonal vectors are as follows:

$$\begin{aligned}
U_1 &= (u_{11}, u_{12}, \ldots, u_{1n}) \\
U_2 &= (u_{21}, u_{22}, \ldots, u_{2n}) \\
&\;\;\vdots \\
U_k &= (u_{k1}, u_{k2}, \ldots, u_{kn})
\end{aligned} \tag{9}$$

It is easily shown that the mean of each of the vectors
$U_1, U_2, \ldots, U_k$ is zero. Let $s_1, s_2, \ldots, s_k$ be the standard
deviations (s.d.s) of $U_1, U_2, \ldots, U_k$ respectively. Since the sample
size is n, there are n different MDs. The MD corresponding to the j-th
observation of the sample is computed using Equation (10).

$$MD_j = \frac{1}{k}\left[\frac{u_{1j}^2}{s_1^2} + \frac{u_{2j}^2}{s_2^2} + \cdots + \frac{u_{kj}^2}{s_k^2}\right] \tag{10}$$

Where j = 1..n. The values of MD obtained from Equations (1) and (10) are
exactly the same. In MTGS methodology, abnormal MDs are computed from
the means, standard deviations and Gram-Schmidt coefficients of the normal
group or Mahalanobis space, while the Mahalanobis space is a database
including means, standard deviations, Gram-Schmidt coefficients and the
Mahalanobis distances.
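
The agreement between Equations (1) and (10) can be checked numerically. In
the Python sketch below, NumPy's QR factorization is used as a stand-in for
the sequential construction of Equations (7): for a full-rank set of vectors,
each column of Q multiplied by the corresponding diagonal entry of R coincides
with the Gram-Schmidt vector. The data are synthetic and the population
standard deviation (divisor n) is an assumption applied consistently to both
equations.

```python
import numpy as np

# Synthetic normal group: 50 observations of 3 correlated variables (made-up data).
rng = np.random.default_rng(2)
mix = np.array([[1.0, 0.6, 0.2],
                [0.0, 1.0, 0.4],
                [0.0, 0.0, 1.0]])
normal = rng.normal(size=(50, 3)) @ mix

# Column i of z holds the standardized vector Z_i of Equations (8).
z = (normal - normal.mean(axis=0)) / normal.std(axis=0)
n, k = z.shape

# Gram-Schmidt vectors via QR (stand-in): U_i = R_ii * Q_i.
q, r = np.linalg.qr(z)
u = q * np.diag(r)

# Equation (10): sum of squared GS elements, each scaled by its variance.
s = u.std(axis=0)
md_gsp = np.sum((u / s) ** 2, axis=1) / k

# Equation (1): inverse correlation matrix route, for comparison.
corr = np.corrcoef(normal, rowvar=False)
md_inv = np.einsum("ij,jl,il->i", z, np.linalg.inv(corr), z) / k

print("largest difference between the two routes:", np.max(np.abs(md_gsp - md_inv)))
```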
Predictions Based on Gram-Schmidt Variables

According to the present invention, a method of making predictions
using Gram-Schmidt (GS) variables without calculating the Mahalanobis
distance is provided. This method is useful in situations where the reference
group includes variables with small or even zero standard deviation or
variance. In the most extreme case, where variables have zero standard
deviation, correlations with other variables cannot be computed and hence
calculation of Mahalanobis distances is not possible, although variables with
zero standard deviation can represent very important patterns. This type of
situation is frequently seen in pattern recognition problems.

The method of making predictions according to one embodiment of the 
present invention is described in the following steps: 

1) Subtract the mean vector from all observations in the normal group. Let
$X_1, X_2, \ldots, X_k$ denote the original vectors and
$L_1, L_2, \ldots, L_k$ denote the vectors that are obtained after
subtracting the mean vector.

2) Conduct GSP on $L_1, L_2, \ldots, L_k$. If some variables have zero
variance or, synonymously, zero standard deviation, then these variables
will be zero vectors after the respective means are subtracted from the
original values. In such situations these zero vectors are also used as GS
vectors because they are orthogonal to any other vector. Let
$U_1, U_2, \ldots, U_k$ denote the Gram-Schmidt vectors corresponding to
$L_1, L_2, \ldots, L_k$. Here, the reference group consists of the means
and the coefficients of the Gram-Schmidt vectors.

3) Obtain Gram-Schmidt vectors corresponding to the observations
outside the reference group by using means and Gram-Schmidt
coefficients of the reference group.

4) Compute dynamic S/N ratios for the Gram-Schmidt variables
($U_1, U_2, \ldots, U_k$) using values of severity of the conditions
(observations) as input signals. The severity of conditions can be actual
values or, optionally, assigned values. The procedure for computing S/N
ratios is as follows:

If $M_1, M_2, \ldots, M_t$ represent the true levels of severity (input
signals) corresponding to t abnormals, the relationship between
the input signal ($M_i$'s) and the j-th variable ($U_{ij}$'s) is given by
the following equation:

$$U_{ij} = \beta_j M_i, \quad i = 1, \ldots, t;\; j = 1, \ldots, k \tag{11}$$

and $\beta_j$ is the linear slope of the relation between $U_{ij}$ and $M_i$.

Then calculate the following quantities:

$S_T$ = Total Sum of Squares = $\sum_{i=1}^{t} U_{ij}^2$

$r$ = Sum of squares due to input signal = $\sum_{i=1}^{t} M_i^2$

$S_\beta$ = Sum of Squares due to Slope = $\frac{1}{r}\left[\sum_{i=1}^{t} M_i U_{ij}\right]^2$

$S_e$ = Error Sum of Squares = $S_T - S_\beta$

$V_e$ = Error Variance = $S_e / (t-1)$

The linear slope for the j-th variable is given by:

$$\beta_j = \frac{1}{r}\sum_{i=1}^{t} M_i U_{ij} \tag{12}$$

The S/N ratio, $\eta_j$, corresponding to the j-th variable is given by:

[Equation (13) is not legible in the source]        (13)


5) After computing $\eta_j$ and $\beta_j$ for each Gram-Schmidt variable,
calculate the predicted values of the abnormals. The predicted value of the
i-th abnormal condition is obtained from Equation (14):

[Equation (14) is not legible in the source]

where i = 1, ..., t and $U_{ij}$ is the Gram-Schmidt element corresponding
to the j-th variable in the i-th condition.

6) If there is a good correlation between the predicted values and actual
values, then Equation (14) is useful for future predictions. Here again,
the S/N ratio can be used to examine the accuracy of the prediction, that
is, the correlation between predicted values and actual values.
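
The intermediate quantities of step 4 can be computed as in the following
Python sketch, which covers only Equation (12) and the sums of squares
defined with it. The function name slope_and_error_variance, the severity
levels and the Gram-Schmidt values are all hypothetical; the S/N ratio of
Equation (13) and the prediction of Equation (14) are left out of the sketch.

```python
import numpy as np

def slope_and_error_variance(m, u):
    """m: severities M_i, shape (t,); u: GS variables U_ij of the abnormals, shape (t, k)."""
    m = np.asarray(m, dtype=float)
    u = np.asarray(u, dtype=float)
    t = m.shape[0]
    r = np.sum(m ** 2)                 # sum of squares due to input signal
    s_t = np.sum(u ** 2, axis=0)       # total sum of squares, one value per variable j
    linear = m @ u                     # sum over i of M_i * U_ij, one value per variable j
    s_beta = linear ** 2 / r           # sum of squares due to slope
    s_e = s_t - s_beta                 # error sum of squares
    v_e = s_e / (t - 1)                # error variance
    beta = linear / r                  # Equation (12)
    return beta, s_beta, v_e

# Made-up example: four abnormals with assigned severities 1..4 and two GS variables.
m = np.array([1.0, 2.0, 3.0, 4.0])
u = np.column_stack([
    2.0 * m,                                          # exactly proportional to M
    -0.5 * m + np.array([0.1, -0.1, 0.1, -0.1]),      # proportional plus a disturbance
])
beta, s_beta, v_e = slope_and_error_variance(m, u)
print("slopes:", beta)
print("error variances:", v_e)
```

A variable that follows the input signal exactly has zero error variance,
while the disturbed variable shows a positive error variance.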

Multiple Mahalanobis distance

Selection of suitable subsets is very important in multivariate
diagnosis/pattern recognition activities, as it is difficult to handle large
datasets with a large number of variables. The present invention applies a new
metric called Multiple Mahalanobis Distance (MMD) for computing S/N ratios to
select suitable subsets. This method is useful in complex situations,
illustratively including voice recognition or TV picture recognition. In these
cases, the number of variables runs into the order of several thousands. Use of
the MMD method helps to reduce the problem complexity and to make effective
decisions in complex situations.

In the MMD method, a large number of variables is divided into several
subsets containing local variables. For example, in a voice recognition pattern
(as shown in Figure 2), let there be k subsets. The subsets correspond to k
patterns numbered 1, 2, ..., k. Each pattern starts at a low value, reaches a
maximum and then again returns to the low value. These patterns (subsets) are
described by a set of respective local variables. In the MMD method, the
Mahalanobis distances are calculated for each subset. These Mahalanobis
distances are used to calculate the MMD. Using abnormal MMDs, S/N ratios are
calculated to determine useful subsets. In this way the complexity of the
problem is reduced.

This method is also useful for identifying the subsets (or variables in 
the subsets) corresponding to different failure modes or patterns that are 
responsible for higher values of MDs. For example, in the case of a final 
product inspection system, use of the MMD method would help to find the 
variables corresponding to different processes that are responsible for 
product failure. 

If the variables corresponding to different subsets or processes cannot 
be identified, then the decision-maker can select subsets from the original set 
of variables and identify the best subsets required. 

Exemplary Steps in Inventive Process 

1. Define subsets from the original set of variables. The subsets may 
contain variables corresponding to different patterns or failure modes. 
The subsets can also be chosen at the decision maker's discretion. The 
number of variables in the subsets need not be the same. 

2. For each subset, calculate MDs (for normals and abnormals) using the 
respective variables in that subset. 

3. Compute the square root of these MDs (√MDs). 

4. Consider the subsets as variables (control factors). The √MDs provide 
the required data for these subsets. If there are k subsets, then the 
problem is similar to an MTS problem with k variables. The number of 
normals and abnormals will be the same as in the original problem. The 
analysis with √MDs is exactly analogous to that of the MTS method with 
the original variables. The new Mahalanobis distance obtained based on 
the square roots of MDs is referred to as the Multiple Mahalanobis 
Distance (MMD). 

5. With the MMDs, S/N ratios are obtained for each run of an orthogonal 
array. Based on gains in S/N ratios, the important subsets are selected. 
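
Steps 1 to 4 above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names `scaled_md` and `multiple_md` are hypothetical, the MD is taken as the usual scaled distance (divided by the number of variables) computed from the correlation matrix of the normal group, and the data are synthetic. The orthogonal-array analysis of step 5 is not included.

```python
import numpy as np

def scaled_md(normals, samples):
    """Scaled Mahalanobis distance (divided by the number of variables)
    of each row of `samples`, measured from the normal group."""
    mean = normals.mean(axis=0)
    std = normals.std(axis=0)                  # population standard deviation
    z_norm = (normals - mean) / std
    z_samp = (samples - mean) / std
    corr = z_norm.T @ z_norm / len(normals)    # correlation matrix of the normals
    inv = np.linalg.inv(corr)
    n_vars = normals.shape[1]
    return np.einsum("ij,jk,ik->i", z_samp, inv, z_samp) / n_vars

def multiple_md(normals, samples, subsets):
    """MMD of each row of `samples`; `subsets` is a list of column-index lists."""
    # Steps 2 and 3: one column of sqrt(MD) per subset.
    root_norm = np.column_stack(
        [np.sqrt(scaled_md(normals[:, s], normals[:, s])) for s in subsets])
    root_samp = np.column_stack(
        [np.sqrt(scaled_md(normals[:, s], samples[:, s])) for s in subsets])
    # Step 4: treat the subsets as variables and compute the MD once more.
    return scaled_md(root_norm, root_samp)

# Synthetic data, for illustration only (not the data of the examples below).
rng = np.random.default_rng(0)
normals = rng.normal(size=(200, 6))
abnormals = rng.normal(loc=3.0, size=(20, 6))
subsets = [[0, 1], [2, 3], [4, 5], [0, 2, 4]]   # subset sizes need not match

mmd_normals = multiple_md(normals, normals, subsets)
mmd_abnormals = multiple_md(normals, abnormals, subsets)
print(mmd_normals.mean(), mmd_abnormals.mean())
```

With this scaling the average MMD of the normal group is 1, so abnormal MMDs can be read on the same kind of scale as ordinary MDs.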

Example 1 

The adjoint matrix method is applied to the liver disease test data 
considered earlier. For better understanding of the discussion, the 
correlation matrix, inverse matrix and adjoint matrix corresponding to the 17 
variables are given in Tables 3, 4, and 5 respectively. In this case the 
determinant of the correlation matrix is 0.00001314. 

The Mahalanobis distances calculated by the inverse matrix method (MDs) 
and the adjoint matrix method (MDAs) are given in Table 6 (for the normal 
group) and in Table 7 (for the abnormal group). From Table 6, it is clear that 
the average MDA for normals does not converge to 1.0. MDAs and MDs are 
related according to Equation (4). 

An L32(2^31) orthogonal array (OA) is used to accommodate the 17 variables. 
Table 8 gives dynamic S/N ratios for all the combinations of this array with 
the inverse matrix method and the adjoint matrix method. Table 9 shows the 
gain in S/N ratios for both methods. It is clear that the gains in S/N ratios 
are the same for both methods. The important variable combination based on 
these gains is: X4-X5-X10-X12-X13-X14-X15-X17. From Table 10, which shows 
system performance in the form of S/N ratios, it is clear that there is a gain 
of 1.98 dB if the useful variables are used instead of all the variables. This 
gain is also exactly the same as that obtained with the inverse matrix method. 
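
The relation between the two distances can be illustrated numerically. The following is a minimal sketch, assuming that Equation (4) has the form MD = MDA / det(C), which follows from the identity inverse(C) = adj(C) / det(C). The function names are hypothetical and the data are synthetic, not the liver disease data.

```python
import numpy as np

def adjugate(c):
    """Adjoint (adjugate) matrix from cofactors; no division by det(c)."""
    n = c.shape[0]
    adj = np.empty_like(c, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(c, i, axis=0), j, axis=1)
            # The adjoint is the transpose of the cofactor matrix.
            adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

def md_inverse(z, c):
    """Scaled MD of each standardized row of z using the inverse matrix."""
    return np.einsum("ij,jk,ik->i", z, np.linalg.inv(c), z) / c.shape[0]

def md_adjoint(z, c):
    """MDA of each standardized row of z using the adjoint matrix."""
    return np.einsum("ij,jk,ik->i", z, adjugate(c), z) / c.shape[0]

# Synthetic normal group, for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 4))
z = (x - x.mean(axis=0)) / x.std(axis=0)
c = z.T @ z / len(z)                      # correlation matrix

md = md_inverse(z, c)
mda = md_adjoint(z, c)
print(np.allclose(mda, np.linalg.det(c) * md))
```

Because MDA differs from MD only by the constant factor det(C), comparisons between variable combinations are unaffected by which of the two distances is used.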

Hence, even if the adjoint matrix method is used, the ultimate results 
are the same. However, MDA values are advantageous because they do not 
take into account the determinant of the correlation matrix. In case of 
multi-collinearity problems, as the determinant tends to zero, the inverse 
matrix becomes inaccurate, giving rise to inaccurate MDs. Such problems can 
be avoided if MDAs based on the adjoint matrix method are used. 

Table 8: Dynamic S/N ratios for the combinations of L32(2^31) array 

Run    S/N ratio (Inverse)    S/N ratio (Adjoint) 
1      -6.252                 42.560 
2      -6.119                 42.693 
3      -10.024                38.788 
4      -10.181                38.631 
5      -10.348                38.464 
6      -10.495                38.317 
7      -7.934                 40.878 
8      -8.177                 40.635 
9      -9.234                 39.578 
10     -9.631                 39.181 
11     -3.338                 45.474 
12     -3.406                 45.406 
13     -10.932                37.880 
14     -11.121                37.691 
15     -6.495                 42.317 
16     -7.265                 41.547 
17     -7.898                 40.914 
18     -7.665                 41.147 
19     -10.156                38.656 
20     -9.901                 38.911 
21     -5.431                 43.381 
22     -5.312                 43.500 
23     -7.603                 41.209 
24     -7.498                 41.314 
25     -11.412                37.400 
26     -11.100                37.712 
27     -5.874                 42.938 
28     -4.989                 43.823 
29     -9.238                 39.574 
30     -8.989                 39.823 
31     -5.544                 43.268 
32     -5.303                 43.509 

Table 9: Gain in S/N Ratios 

            Inverse Method                  Adjoint Method 
Variable    Level 1   Level 2   Gain        Level 1   Level 2   Gain 
X1          -8.185    -7.746    -0.440      40.627    41.067    -0.440 
X2          -8.187    -7.742    -0.445      40.625    41.070    -0.445 
X3          -8.249    -7.680    -0.569      40.563    41.132    -0.569 
X4          -7.949    -7.980     0.031      40.863    40.832     0.031 
X5          -7.069    -8.860     1.791      41.743    39.952     1.791 
X6          -8.318    -7.611    -0.706      40.494    41.201    -0.706 
X7          -7.976    -7.954    -0.022      40.836    40.858    -0.022 
X8          -8.824    -7.105    -1.718      39.988    41.707    -1.718 
X9          -8.188    -7.742    -0.446      40.625    41.070    -0.446 
X10         -6.358    -9.571     3.212      42.454    39.241     3.212 
X11         -8.101    -7.828    -0.273      40.711    40.984    -0.273 
X12         -7.821    -8.108     0.287      40.991    40.704     0.287 
X13         -7.562    -8.367     0.805      41.250    40.445     0.805 
X14         -7.315    -8.615     1.300      41.497    40.197     1.300 
X15         -7.590    -8.339     0.749      41.222    40.473     0.749 
X16         -7.982    -7.947    -0.035      40.830    40.865    -0.035 
X17         -7.832    -8.097     0.265      40.980    40.715     0.265 



Table 10: S/N Ratio Analysis 

S/N ratio-optimal system     44.54 dB 
S/N ratio-original system    42.56 dB 
Gain                         1.98 dB 



Example 2 

The adjoint matrix method is applied to another case with 12 variables. 

In this example, there are 58 normals and 30 abnormals. The MDs 
corresponding to normals are computed by using the MTS method; the average 
MD is 0.92, rather than the value of 1.0 expected for the normal group. The 
reason for this discrepancy is the existence of multi-collinearity. This is 
clear from the correlation matrix (Table 11), which shows that the variables 
X10, X11 and X12 have high correlations with each other. The determinant of 
the matrix is also estimated and is found to be 8.693 × 10^-12 (close to 
zero), indicating that the matrix is almost singular. The presence of 
multi-collinearity will also affect the other stages of the MTS method. Hence, 
the adjoint matrix method is used to perform the analysis. 
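
The determinant check described above can be sketched as follows. This is an illustrative sketch only: the helper name `correlation_determinant` is hypothetical and the data are synthetic, chosen so that one variable is almost an exact linear combination of two others.

```python
import numpy as np

def correlation_determinant(data):
    """Determinant of the correlation matrix of the columns of `data`."""
    return np.linalg.det(np.corrcoef(data, rowvar=False))

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))          # synthetic, for illustration only
independent = base
# Third column is almost an exact linear combination of the first two.
collinear = np.column_stack(
    [base[:, 0], base[:, 1], base[:, 0] + base[:, 1] + 1e-4 * base[:, 2]])

det_independent = correlation_determinant(independent)
det_collinear = correlation_determinant(collinear)
print(det_independent, det_collinear)
```

A determinant many orders of magnitude below 1, as in the collinear case, is the kind of signal that suggests using the adjoint matrix in place of the inverse matrix.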
Adjoint Matrix Method 

The adjoint of the correlation matrix is shown in Table 12. 

After computing MDA values for normals, the measurement scale is 
validated by computing abnormal MDA values. Figure 3 indicates that there is 
a clear distinction between normals and abnormals. 

In the next step, important variables are selected using an L16(2^15) 
array. The S/N ratio analysis was performed based on the larger-the-better 
criterion in the usual way. The gains in S/N ratios are shown in Table 13. 
From this table, it is clear that the variables X1-X2-X3-X4-X6-X10-X11-X12 
have positive gains and hence they are important. The confirmation run with 
these variables (Figure 4) indicates that the distinction between normals and 
abnormals is very good. 

Table 13: Gain in S/N ratio 

Variable    Level 1    Level 2    Gain 
X1          -102.90    -105.01     2.12 
X2          -103.53    -104.38     0.86 
X3          -103.84    -104.07     0.22 
X4          -103.72    -104.19     0.47 
X5          -104.04    -103.86    -0.18 
X6          -103.87    -104.04     0.16 
X7          -104.18    -103.72    -0.46 
X8          -104.14    -103.77    -0.37 
X9          -104.33    -103.58    -0.76 
X10         -103.51    -104.40     0.90 
X11         -103.78    -104.13     0.35 
X12         -103.43    -104.48     1.05 


Therefore, the adjoint matrix method can safely replace the inverse 
matrix method, as it is as efficient as the inverse matrix method in general 
and more efficient when there are problems of multi-collinearity. 

Example 3 

From the 17 variables, eight subsets (as shown in Table 14) are 
selected. These subsets are selected to illustrate the MMD methodology; there 
is no rationale for this selection. It is to be noted that the number of 
variables in each subset is not the same. 

Table 14: Subsets for MMD analysis 

Subset    Variables 
S1        X1-X2-X3-X4 
S2        X5-X6-X7-X8 
S3        X9-X10-X11-X12 
S4        X13-X14-X15-X16-X17 
S5        X3-X4-X5-X6 
S6        X10-X11-X12-X13-X14-X15 
S7        X14-X15-X16-X17 
S8        X4-X5-X7-X10-X12-X13-X14-X15 



For each subset, Mahalanobis distances are computed with the help of 
the correlation matrices of the respective variables. Therefore, we have eight 
sets of MDs (for normals and abnormals) corresponding to the subsets. The 
√MDs provide data corresponding to the subsets, which are considered as 
control factors. Tables 15 and 16 show sample data (√MDs) for normals and 
abnormals. 

Table 15: √MDs for normals (sample data) 

S.No    S1       S2       S3       S4       S5       S6       S7       S8 
1       0.873    0.545    0.707    0.756    0.796    0.505    0.832    0.574 
2       0.762    0.540    0.929    0.710    0.499    0.688    0.606    0.807 
3       1.022    0.688    0.550    0.623    0.955    0.479    0.697    0.613 
4       1.102    0.544    0.769    0.740    1.225    0.648    0.827    0.681 
5       1.022    0.640    0.602    0.888    0.815    0.782    0.934    0.695 
...     ...      ...      ...      ...      ...      ...      ...      ... 
196     1.041    0.786    1.691    1.513    0.500    1.550    1.539    1.411 
197     1.467    1.310    2.101    1.201    1.457    1.481    0.611    1.373 
198     1.086    1.278    0.974    1.406    1.410    1.834    0.994    1.648 
199     1.238    0.999    1.107    1.061    1.206    1.132    0.964    1.700 
200     1.391    0.924    0.979    0.680    1.094    2.156    0.750    1.844 

Table 16: √MDs for abnormals (sample data) 

S.No    S1       S2        S3       S4       S5        S6        S7       S8 
1       1.339    2.930     2.610    3.428    2.574     3.277     2.913    3.734 
2       1.491    3.469     1.931    1.511    3.267     3.388     1.687    3.932 
3       1.251    2.700     0.742    2.631    2.447     3.322     2.660    4.365 
4       2.124    2.507     2.041    3.240    2.518     3.058     2.009    3.395 
5       1.010    2.182     2.867    1.279    1.861     4.035     1.090    4.440 
...     ...      ...       ...      ...      ...       ...       ...      ... 
13      1.769    2.819     6.544    2.153    2.352     6.023     2.177    5.776 
14      1.898    2.045     3.817    4.551    2.443     10.213    1.969    9.275 
15      1.624    12.681    2.116    3.672    12.248    9.064     1.202    11.426 
16      5.453    13.314    3.630    1.022    13.515    10.095    1.108    12.121 
17      4.511    16.425    5.489    3.684    12.027    11.142    2.264    10.939 



After arranging the data (√MDs) in this manner, MMD analysis is 
carried out. In this analysis, MMDs are Mahalanobis distances obtained from 
√MDs. Tables 17 and 18 provide sample values of MMDs for normals and 
abnormals respectively. 

The next step is to assign the subsets to the columns of a suitable 
orthogonal array. Since there are eight subsets, an L12(2^11) array was 
selected. The abnormal MMDs are computed for each run of this array. After 
performing average response analysis, gains in S/N ratios are computed for all 
the subsets. These details are shown in Table 19. 

Table 19: Gain in S/N ratios 

Subset    Level 1    Level 2    Gain 
S1        15.498     18.053     -2.555 
S2        17.463     16.089      1.374 
S3        16.712     16.839     -0.127 
S4        15.925     17.627     -1.702 
S5        17.626     15.926      1.700 
S6        17.243     16.309      0.934 
S7        15.683     17.869     -2.186 
S8        18.556     14.996      3.560 



From this table it is clear that S8 has the highest gain, indicating that 
this is a very important subset. It should be noted that the variables in this 
subset are the same as the useful variables obtained from the MTS method. This 
example is a simple case with only 17 variables, and therefore the MMD method 
may not be necessary here. However, in complex cases with several hundreds of 
variables, the MMD method is more appropriate and reliable. 
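
The average response analysis that produces gains such as those in Table 19 can be sketched as follows. For brevity the sketch uses the small L4(2^3) orthogonal array rather than the L12(2^11) array of this example, and the S/N values per run are hypothetical, chosen only to show the arithmetic. The function name `level_gains` is likewise hypothetical.

```python
import numpy as np

# L4(2^3) two-level orthogonal array: rows are runs, columns are factors.
L4 = np.array([[1, 1, 1],
               [1, 2, 2],
               [2, 1, 2],
               [2, 2, 1]])

def level_gains(design, sn_ratios):
    """Gain per column: mean S/N at level 1 minus mean S/N at level 2."""
    design = np.asarray(design)
    sn_ratios = np.asarray(sn_ratios, dtype=float)
    result = []
    for column in design.T:
        level1 = sn_ratios[column == 1].mean()
        level2 = sn_ratios[column == 2].mean()
        result.append(level1 - level2)
    return np.array(result)

# Hypothetical S/N ratios (dB), one per run of the array.
sn = [18.0, 15.0, 12.0, 17.0]
subset_gains = level_gains(L4, sn)
selected = [i for i, gain in enumerate(subset_gains) if gain > 0]
print(subset_gains, selected)
```

Columns with a positive gain correspond to subsets worth keeping; in this hypothetical run those are the first and third columns.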
Example 4 

In order to demonstrate the applicability of the Gram-Schmidt process 
(GSP) to predict abnormal conditions without computing the Mahalanobis 
distances, it is applied to the previously discussed medical diagnosis case 
example with 17 abnormal conditions. Out of the 17 conditions, the first ten 
are considered mild and the remaining seven are considered medium. This 
judgment was made by Dr. Kanetaka, who is a liver disease diagnosis 
specialist in Japan. For the purposes of prediction, and since true values of 
severity are unknown, a value of 3 is assigned to the mild group and a value 
of 9 is assigned to the medium group. Table 20 provides the summary of data 
analysis for abnormals in this case example generated by GSP. Figure 5 shows 
that there is a good match between the actual level of severity and the 
predicted values. 

Two variables with zero standard deviations are intentionally 
introduced. These variables are taken as the first and second variables, so 
that the total number of variables is now 19. Table 21 provides the summary of 
data analysis for abnormals in this instance. As with the data of Figure 5, 
there is a good match between the actual level of severity and the predicted 
values, as shown in Figure 6. 

Publications mentioned in the specification are indicative of the levels 
of those skilled in the art to which the invention pertains. These publications 
are incorporated herein by reference to the same extent as if each individual 
publication was specifically and individually incorporated herein by reference. 

The foregoing description is illustrative of particular embodiments of 
the invention, but is not meant to be a limitation upon the practice thereof. 
The following claims, including all equivalents thereof, are intended to define 
the scope of the invention. 